ABSTRACT


This report is concerned with methods of approximating the chance-
constrained set $S = \{x \mid \Pr[Ax \le B] \ge \alpha\}$ when the underlying
distribution, $F(\cdot)$, of the random variate $(A, B)$ is non-normal. The
resulting sets are completely distribution-free in that no assumptions are
made about the form of $F(\cdot)$ or any of its parameters.

The concept employed is the distribution-free tolerance region. This
is a sample-based region containing 100$\alpha$ percent of the population, at
a confidence level, $\beta$. The elements of the distribution-free sets
satisfy the chance-constraint $\Pr[Ax \le B] \ge \alpha$ with a confidence of
at least $\beta$. Furthermore, the sample size required to attain this level
of confidence is readily available in tabular or graphical form. The
superiority of the distribution-free approach over existing
chance-constrained methods is demonstrated using simulated gamma variates.

CHAPTER I


INTRODUCTION 

Consider the linear programming problem of the form

$$\text{maximize } Z = Cx$$

subject to

$$A_i x \le B_i, \quad i = 1, \ldots, q \tag{1.1}$$

$$x \ge 0$$

$x$ is an n-dimensional column vector, and $A_i$ and $C$ are n-dimensional
row vectors. In real-world problems the elements of $C$, $B_i$ and $A_i$ may
be random variables, and in such a case the above formulation (1.1) has no
meaning. The random variable $Z$ cannot be maximized and must be replaced
with some deterministic function. The most widely used function is the
expected value of $Z$, although other choices have been suggested in the
literature [1,2,3]. This research is concerned only with random variation
in the constraints. In particular, the chance-constrained formulation
originally proposed by Charnes et al. [4] is considered. For a review of
other possible reformulations of linear programming problems subject to
random variation, the reader is referred to the survey paper by
McQuillan [5].

Chance-Constraints

In chance-constrained programming it is not required that the constraints
always be satisfied, but rather that they be satisfied with given
probabilities. More precisely, the chance-constrained reformulation of
(1.1) associates with each constraint a preassigned number $\alpha_i$,
$0 < \alpha_i < 1$, $i = 1, \ldots, q$, such that
$\Pr[A_i x \le B_i] \ge \alpha_i$, $i = 1, \ldots, q$.

The corresponding feasible solution set is then given by

$$S = \{x \mid \Pr[L_i(x) \le 0] \ge \alpha_i,\ i = 1, \ldots, q;\ x \ge 0\} \tag{1.2}$$

where

$$L_i(x) = A_i x - B_i, \quad i = 1, \ldots, q$$

It is desired to convert $S$ into a form more amenable to existing
mathematical programming techniques. The method of conversion suggested by
Charnes [2] yields the equivalent form

$$S_Q = \{x \mid E[L_i(x)] + K_{\alpha_i}\,\sigma[L_i(x)] \le 0,\ i = 1, \ldots, q;\ x \ge 0\} \tag{1.3}$$

where $E[L_i(x)]$ and $\sigma[L_i(x)]$ denote the expected value and standard
deviation of $L_i(x)$, respectively. $K_{\alpha_i}$ is the smallest number
satisfying

$$\Pr[T_i(x) \le K_{\alpha_i}] \ge \alpha_i$$

where $T_i(x)$ is the standardized variate of $L_i(x)$. ($K_{\alpha_i}$ is
often referred to as the quantile of order $\alpha_i$.) When
$K_{\alpha_i} \ge 0$, it can be shown [6] that $S_Q$ is convex. In such a
case any one of a number of convex programming algorithms could be used to
solve the resulting problem.

The above approach, henceforth called the Quantile Method, is limited to a
special class of distributions which are referred to as "stable" [7]. The
common property of this class is that the distributions are completely
specified by two parameters $U$ and $V$, and the convolution of any k
distributions $F[(x - U_1)/V_1], \ldots, F[(x - U_k)/V_k]$ is again of the
form $F[(x - U)/V]$. One such distribution belonging to this class is the
normal, thus giving the Quantile Method some appeal. However, many times
the elements of $A_i$ and $B$ are not normal. For example, the elements of
$A_i$ may represent rates which have to be non-negative. In such cases
alternative approaches [8,9] have been proposed for obtaining convex
solution sets which approximate the set $S_Q$. The most general procedure
is given by Sinha [9], in that only the means, variances and covariances of
the random variables need be specified. Using the Tchebysheff Extended
Lemma [10], it is shown that $S_Q$ contains the convex set


$$S_T = \{x \mid E[L_i(x)] + \sqrt{\alpha_i/(1-\alpha_i)}\;\sigma[L_i(x)] \le 0,\ i = 1, \ldots, q;\ x \ge 0\} \tag{1.4}$$

This method of conversion shall henceforth be referred to as the
Tchebysheff Method.
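
The difference between the two conversions can be made concrete by
comparing their multipliers of $\sigma[L_i(x)]$. The sketch below is only
illustrative: it assumes a normal standardized variate for the Quantile
Method and assumes that the multiplier in (1.4) is the one-sided
Tchebysheff bound $\sqrt{\alpha/(1-\alpha)}$; the function names are
hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def quantile_multiplier(alpha: float) -> float:
    """K_alpha for a standard normal variate (assumed distribution)."""
    return NormalDist().inv_cdf(alpha)


def tchebysheff_multiplier(alpha: float) -> float:
    """Assumed distribution-free multiplier sqrt(alpha / (1 - alpha))."""
    return sqrt(alpha / (1.0 - alpha))


# A larger multiplier tightens the constraint and shrinks the feasible set.
for alpha in (0.50, 0.90, 0.95, 0.99):
    print(f"alpha={alpha:.2f}  "
          f"K_normal={quantile_multiplier(alpha):6.3f}  "
          f"K_tchebysheff={tchebysheff_multiplier(alpha):6.3f}")
```

Under these assumptions the Tchebysheff multiplier is never smaller than
the normal quantile, which is the price paid for dropping the normality
assumption.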

Motivation  and  Objective  of  this  Research 

Although the Tchebysheff Method makes possible the solution of
chance-constrained programs under non-normal conditions, there still exists
a reliance upon parameters of the underlying distribution. In real-world
situations, the values associated with these parameters are estimates
derived from random samples. The accuracy of these estimates can be
measured in terms of levels of significance or degrees of confidence, but
there is no way of directly incorporating these measures into the set
$S_T$. Thus, the effect of bad estimates upon the solution obtained using
the Tchebysheff Method cannot be ascertained. A similar situation also
occurs with the Quantile Method when normality assumptions are sample
based.

The above discussion suggests a need for a more generalized theory and
method of solving chance-constrained linear programming problems when
random sampling is necessary. The most general would be a method which
could be used regardless of the forms of the underlying distributions or
any of their parameters. While such a requirement precludes the use of any
classical statistical techniques, there exists a special class of so-called
distribution-free techniques which are applicable in situations similar to
the above.

The objective of this research is to develop methods for constructing
a distribution-free set from a sample of size N such that for any x
contained in this set, it can be asserted with a confidence level, $\beta$,
that a constraint will hold with a certain probability, $\alpha$. The concept
to be employed is that of a distribution-free tolerance region, similar to
the one used by Allen and Braswell [11].

CHAPTER II


THEORY  AND  METHODS  OF  DISTRIBUTION-FREE 
TOLERANCE  REGIONS 


This  chapter  deals  with  the  development  of  the  theory 
and  methods  which  serve  as  the  statistical  foundation  for 
Chapter  III.  The  chapter  begins  with  a  definition  of  a 
distribution-free  tolerance  region,  then  follows  with  a 
general  procedure  for  constructing  such  a  region. 

Definition  of  a  Distribution-free 
Tolerance  Region 

Let $Y = (Y_1, \ldots, Y_n)$ be an n-dimensional random variable with a
cumulative distribution function (c.d.f.) $H_Y(\cdot)$. Let
$O_N = (Y_k,\ k = 1, \ldots, N)$ be a sample of size N drawn from a
population with c.d.f. $H_Y(\cdot)$. Let $T$ be a region that lies in the
sample space of $Y$, and assume that the exact shape and size of $T$ depend
upon the observed values of $O_N$. Define the coverage, $U$, of the region,
$T$, as the probability measure of $T$. Since $T$ is random, $U$ will also
be random. Now if the corresponding c.d.f. of $U$ is independent of
$H_Y(\cdot)$, and if for $0 < \alpha < 1$, $0 < \beta < 1$

$$\Pr[U \ge \alpha] = \beta$$

then $T$ is called a 100$\alpha$ percent distribution-free tolerance region
at a probability level, $\beta$ [12]. This concept was originally introduced
by Shewhart [13] in 1931.

The above definition is interpreted by Fraser [14] as: "In repeated
sampling the probability is $\beta$ that the region $T$ contains at least
100$\alpha$ percent of the population." Now for a particular experimental
value of $O_N$, the corresponding region, $T$, may or may not contain at
least 100$\alpha$ percent of the population. However, one can assert with a
confidence of $\beta$ that it does.

It should be noted that the term non-parametric has also been used to
describe the above region [14,15]. As Noether [16] indicates, "this term
has come to refer to methods that are valid in some sense or other under
less restrictive assumptions than those of normality or another specific
distribution type." The terms distribution-free and non-parametric are not
always synonymous, however; for example, in testing statistical hypotheses,
a non-parametric test is one which makes no hypothesis about the value of a
parameter in a statistical density function, whereas a distribution-free
test is one which makes no assumptions about the precise form of the
sampled population [17].

Construction of a Distribution-free
Tolerance Region


One-dimensional Case

The general method of constructing a distribution-free tolerance region is
best introduced by considering the case of a one-dimensional random
variable $Y$ with continuous c.d.f. $H_Y(\cdot)$.

Let $Y_{(1)}, \ldots, Y_{(N)}$ be the order statistics of a random sample
from a population with continuous c.d.f. $H_Y(\cdot)$. In 1941 Wilks [18]
showed that the symmetric interval $[Y_{(j)}, Y_{(N-j+1)}]$ could serve as
a distribution-free tolerance region, and in 1943 Wald [23] derived similar
results for any two order statistics. Robbins [19] subsequently showed
that order statistics alone could be used to construct a distribution-free
tolerance interval.

The Dirichlet distribution is used in the construction of a
distribution-free tolerance interval (and region), so it is worthwhile to
review the definition of this distribution and two of its properties.

Definition: Let $(X_1, \ldots, X_n)$ be an n-dimensional random variable
with a probability density function (p.d.f.) of the form

$$f(x_1, \ldots, x_n) =
\begin{cases}
\dfrac{\Gamma(\nu_1 + \cdots + \nu_{n+1})}{\Gamma(\nu_1) \cdots \Gamma(\nu_{n+1})}\,
x_1^{\nu_1 - 1} \cdots x_n^{\nu_n - 1}\,(1 - x_1 - \cdots - x_n)^{\nu_{n+1} - 1},
& (x_1, \ldots, x_n) \in S_n \\[1ex]
0, & \text{otherwise}
\end{cases}$$

where $S_n$ is the simplex
$\{(x_1, \ldots, x_n) \mid x_i \ge 0,\ i = 1, \ldots, n,\ \sum_{i=1}^{n} x_i \le 1\}$
in $R^n$; $\nu_i$, $i = 1, \ldots, n+1$, are real and positive; and
$\Gamma(\cdot)$ denotes the gamma function.

A distribution having the above p.d.f. is called an n-variate Dirichlet
distribution and is denoted by $D(\nu_1, \ldots, \nu_n; \nu_{n+1})$.

The  two  properties  of  a  Dirichlet  distribution  that 
will  be  used  in  this  section  are  the  following  [12]. 

Property 1: If $(X_1, \ldots, X_n)$ is distributed as the n-variate
Dirichlet $D(\nu_1, \ldots, \nu_n; \nu_{n+1})$, then the marginal
distribution of $(X_1, \ldots, X_k)$, $k < n$, is the k-variate Dirichlet
$D(\nu_1, \ldots, \nu_k; \nu_{k+1} + \cdots + \nu_{n+1})$.

Property 2: If $(X_1, \ldots, X_n)$ is distributed as the n-variate
Dirichlet distribution $D(\nu_1, \ldots, \nu_n; \nu_{n+1})$, then the sum
$X_1 + \cdots + X_n$ is distributed as a beta distribution
$B(\nu_1 + \cdots + \nu_n, \nu_{n+1})$.

Turning now to the construction of a distribution-free tolerance interval,
consider the random intervals

$$(-\infty, Y_{(1)}],\ (Y_{(1)}, Y_{(2)}],\ \ldots,\ (Y_{(N)}, +\infty)$$

and let $U_i$, $i = 1, \ldots, N+1$, denote the corresponding coverages
associated with these intervals. It can be shown [12] that the coverages
$U_1, \ldots, U_N$ are random variables having the N-variate Dirichlet
distribution $D(1, \ldots, 1; 1)$, which is completely symmetric in the
variables. It follows from symmetry and Property 1 that any k coverages
($k < N$) have the k-variate Dirichlet distribution
$D(1, \ldots, 1; N-k+1)$, and from this and Property 2 it also follows that
the sum of any k coverages has the beta distribution $B(k, N-k+1)$.
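
The distribution-free character of the coverages can be checked by
simulation. The following is a minimal sketch, assuming an exponential
population (chosen here only because its c.d.f. is known in closed form)
and hypothetical function names. It compares the simulated frequency of
"coverage at least $\alpha$" with the value implied by the beta
distribution $B(k, N-k+1)$, written as an equivalent binomial sum.

```python
import random
from math import comb, exp


def exact_prob(n: int, k: int, alpha: float) -> float:
    """Pr[U >= alpha] for U ~ Beta(k, n - k + 1), as a binomial sum."""
    return sum(comb(n, j) * alpha**j * (1 - alpha) ** (n - j) for j in range(k))


def exp_cdf(y: float) -> float:
    """c.d.f. of the simulated (unit exponential) population."""
    return 1.0 - exp(-y)


def simulated_prob(n: int, k1: int, k2: int, alpha: float,
                   reps: int = 20000, seed: int = 1) -> float:
    """Share of samples whose interval (Y_(k1), Y_(k1+k2)] covers >= alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = sorted(rng.expovariate(1.0) for _ in range(n))
        # Coverage of the interval is the sum of k2 adjacent coverages.
        coverage = exp_cdf(y[k1 + k2 - 1]) - exp_cdf(y[k1 - 1])
        hits += coverage >= alpha
    return hits / reps


n, k1, k2, alpha = 10, 2, 7, 0.5
print(exact_prob(n, k2, alpha), simulated_prob(n, k1, k2, alpha))
```

Replacing the exponential population with any other continuous one should
leave the simulated frequency unchanged up to sampling noise.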


Now for any two order statistics $Y_{(k_1)}$, $Y_{(k_1+k_2)}$, the coverage
$U_{k_2}$ associated with the random interval
$[Y_{(k_1)}, Y_{(k_1+k_2)}]$ is the sum of $k_2$ coverages and hence has
the beta distribution $B(k_2, N-k_2+1)$. Since this holds for any
distribution, then with

$$\Pr[U_{k_2} \ge \alpha] = \beta \tag{2.1}$$

for $0 < \alpha < 1$, $0 < \beta < 1$, $[Y_{(k_1)}, Y_{(k_1+k_2)}]$ is a
100$\alpha$ percent distribution-free tolerance interval at probability
level $\beta$.

Using K. Pearson's [20] notation for the incomplete beta function, (2.1)
reduces to

$$I_{1-\alpha}(N-k_2+1,\ k_2) = \beta \tag{2.2}$$

Now for fixed $\alpha$, $\beta$, $k_2$, there may exist no sample size N for
which (2.2) holds exactly. However, since the left-hand side of (2.2) is a
monotone increasing function of N, there exists a smallest integer N for
which

$$I_{1-\alpha}(N-k_2+1,\ k_2) \ge \beta$$

For example, for $\alpha = .95$, $\beta = .99$, $k_2 = 128$, one could use
the tables of the incomplete beta function [20] to find N = 130. It should
be noted that Murphy [21] gives graphs of $\alpha$ as a function of N for
fixed values of $\beta$ and $m = N - k_2 + 1$ (number of intervals
excluded). Somerville [15] extends Murphy's results in tabular form.
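
In place of tables or graphs, the smallest sample size can be found by
direct search. The sketch below is one way to do this under stated
assumptions: it is parameterized by m, the number of excluded intervals (as
in Murphy's graphs), and it evaluates the incomplete beta function through
the standard identity $I_p(m, N-m+1) = \Pr[\text{Binomial}(N, p) \ge m]$.
The function names are hypothetical.

```python
from math import comb


def confidence(n: int, m: int, alpha: float) -> float:
    """I_{1-alpha}(m, n-m+1): Pr[coverage >= alpha] with m intervals excluded."""
    p = 1.0 - alpha
    # Identity: I_p(m, n-m+1) = Pr[Binomial(n, p) >= m].
    return 1.0 - sum(comb(n, j) * p**j * alpha ** (n - j) for j in range(m))


def smallest_sample_size(alpha: float, beta: float, m: int) -> int:
    """Smallest N whose confidence reaches beta, for a fixed m."""
    n = m
    while confidence(n, m, alpha) < beta:
        n += 1
    return n


print(smallest_sample_size(0.95, 0.99, 1))  # one excluded interval
print(smallest_sample_size(0.95, 0.99, 2))  # two excluded intervals
```

With $\alpha = .95$ and $\beta = .99$ the search gives N = 90 for m = 1 and
N = 130 for m = 2. The linear search is adequate here because the
confidence increases with N when m is held fixed.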

Scheffé and Tukey [22] extended the above results to the case where $Y$ is
discontinuously distributed by showing that the closed interval
$[Y_{(k_1)}, Y_{(k_1+k_2)}]$ could serve as a 100$\alpha$ percent
distribution-free tolerance interval at a probability of at least $\beta$,
and the open interval $(Y_{(k_1)}, Y_{(k_1+k_2)})$ at a probability level
of at most $\beta$.

n-dimensional Case

Wald [23] extended the above method to the case of a continuously
distributed n-dimensional random variable $Y$. His resulting
distribution-free tolerance region consisted of the union of rectangular
regions in $R^n$. In this section a generalization of this method due to
Tukey [24] is presented. Further generalizations, due to Fraser [14,25]
and Kemperman [26], do not concern this research. The basic underlying
notion in Tukey's method is that of a "statistically equivalent block,"
which is the multivariate analogue of the interval between two adjacent
order statistics. To visualize the block construction, it is convenient to
think of the random sample, $O_N$, as N points in $R^n$. Let $\phi_i(Y)$,
$i = 1, \ldots, N$, be numerical valued functions with continuous c.d.f.'s.
The exact choice of these functions will depend on the desired form of the
tolerance region to be constructed. Suppose these functions are used to
section $R^n$ in the following manner:

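First divide $R^n$ into two complementary regions, $B_1$ and $B'_1$, such
that

$$B_1 = \{Y \mid \phi_1(Y) > W_1\} \tag{2.3}$$

by means of the cut

$$\Pi_1 = \{Y \mid \phi_1(Y) = W_1\}$$

where

$$W_1 = \max_k \phi_1(Y_k) = \phi_1(Y_{k_1})$$

which defines $Y_{k_1}$.

Let $B'_1$ be divided into the two complementary regions $B_2$ and $B'_2$,
such that

$$B_2 = \{Y \mid \phi_1(Y) < W_1,\ \phi_2(Y) > W_2\} \tag{2.4}$$

by means of the cut

$$\Pi_2 = \{Y \mid \phi_1(Y) < W_1,\ \phi_2(Y) = W_2\}$$

where

$$W_2 = \max_{k \ne k_1} \phi_2(Y_k) = \phi_2(Y_{k_2})$$

Continue this procedure for the remaining sample points where, in general,
for $p \le N$,

$$B_p = \{Y \mid \phi_1(Y) < W_1, \ldots, \phi_{p-1}(Y) < W_{p-1},\ \phi_p(Y) > W_p\} \tag{2.5}$$

and

$$\Pi_p = \{Y \mid \phi_1(Y) < W_1, \ldots, \phi_{p-1}(Y) < W_{p-1},\ \phi_p(Y) = W_p\}$$

where

$$W_p = \max_{k \ne k_1, \ldots, k_{p-1}} \phi_p(Y_k) = \phi_p(Y_{k_p}) \tag{2.6}$$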

The resulting regions $B_1, \ldots, B_N, B'_N$ are the statistically
equivalent blocks mentioned above. In Reference [12] it is shown that the
coverages $U_1, \ldots, U_N$ associated with the blocks
$B_1, \ldots, B_N$ have the Dirichlet distribution $D(1, \ldots, 1; 1)$.
Thus if $U_m$ denotes the coverage of the sum of any m blocks, then $U_m$
is distributed as a beta distribution $B(m, N-m+1)$.

Let $U_{N-m+1}$ be the coverage of the region $T_{N-m+1}$ formed by
removing m blocks from $R^n$. If

$$\Pr[U_m \le 1 - \alpha] = I_{1-\alpha}(m, N-m+1) = \beta$$

for some $\alpha$, $\beta$ ($0 < \alpha < 1$, $0 < \beta < 1$), then, since
$U_{N-m+1} = 1 - U_m$,

$$\Pr[U_{N-m+1} \ge \alpha] = I_{1-\alpha}(m, N-m+1) = \beta$$

and $T_{N-m+1}$ is a 100$\alpha$ percent distribution-free tolerance region
at probability level $\beta$. It should be noted that if $T_{N-m+1}$ is
found by removing the first m blocks $B_1, \ldots, B_m$, then only the
functions $\phi_i$, $i = 1, \ldots, m$, need to be specified. Furthermore,
the graphs and tables for the one-dimensional case can be used to relate
the parameters $\alpha$, $\beta$, N, and m in the n-dimensional case.
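
The successive-cut procedure can be sketched in a few lines. The code
below is an illustration only, assuming continuous data (no ties), cuts
taken at the maximum of each function as in the construction above, and
hypothetical names and sample values. It returns the thresholds
$W_1, \ldots, W_m$ and tests membership in the region that remains after
the first m blocks are removed.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Cut = Callable[[Point], float]


def remove_blocks(sample: Sequence[Point], cuts: Sequence[Cut]) -> List[float]:
    """Return W_1..W_m, removing one statistically equivalent block per cut."""
    remaining = list(sample)
    thresholds = []
    for phi in cuts:
        # W_p is the largest phi_p value among points not yet used for a cut.
        cut_point = max(remaining, key=phi)
        thresholds.append(phi(cut_point))
        remaining.remove(cut_point)
    return thresholds


def in_region(y: Point, cuts: Sequence[Cut], thresholds: Sequence[float]) -> bool:
    """True if y lies in the region left after removing the m blocks."""
    return all(phi(y) <= w for phi, w in zip(cuts, thresholds))


# Hypothetical sample of four points in the plane, with two coordinate cuts.
sample = [(1.0, 4.0), (3.0, 1.0), (2.0, 2.0), (0.5, 3.0)]
cuts = [lambda y: y[0], lambda y: y[1]]
W = remove_blocks(sample, cuts)
print(W, in_region((2.5, 3.5), cuts, W), in_region((3.5, 1.0), cuts, W))
```

Each sample point defines at most one cut, which is why the chosen point is
dropped before the next function is applied.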

For the case of discontinuous distributions, $B_i$, $i = 1, \ldots, N+1$,
are defined as above with the exception that ($<$) is replaced by ($\le$)
and ($>$) is replaced by ($\ge$). The resulting region becomes a
100$\alpha$ percent distribution-free tolerance region at a probability
level of at least $\beta$. The theoretical justification for such a
statement can be found in [27].

It should be noted that in dealing with discontinuous distributions, a
situation might arise in which two or more sample points minimize a
particular $\phi_i$. In such a case, the construction procedure is no
longer unique, and one must specify in advance a rule for selecting among
these alternative points. Tukey [27] suggested such a rule using the
concept of lexicographical ordering. $(a_1, \ldots, a_n)$ is said to be
less than $(b_1, \ldots, b_n)$ in the lexicographic sense if any of the
following hold:

1. $a_1 < b_1$

2. $a_1 = b_1$, and $a_2 < b_2$

...

n. $a_i = b_i$, $i < n$, and $a_n < b_n$

By defining the functions $\phi_1^*, \phi_2^*, \ldots, \phi_N^*$ as

$$\phi_i^*(\cdot) = (\phi_i(\cdot),\ \phi_{i+1}(\cdot),\ \ldots)$$

a tie-breaking rule would be to select the sample point for which
$\phi_i^*(\cdot)$ is minimized in the lexicographical sense. For example,
if $r_1$ points minimize the function $\phi_1(\cdot)$, then find the points
among these $r_1$ points that minimize the function $\phi_2(\cdot)$. If
$r_2 = 1$, then select the point which minimizes $\phi_2(\cdot)$; otherwise
find the points among the $r_2$ points that minimize $\phi_3(\cdot)$.
Continue the procedure until $r_i = 1$, $i < N$, or $r_N > 1$. In the
latter case the method of constructing the sample blocks will be the same
regardless of the point selected among the $r_N$ points.
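
The tie-breaking rule maps directly onto tuple comparison, which is
lexicographic in Python. The following is a small sketch with a
hypothetical discrete sample and hypothetical function names; it is meant
only to illustrate the ordering, not to reproduce Tukey's full procedure.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def lexicographic_min(points: Sequence[Point],
                      phis: Sequence[Callable[[Point], float]]) -> Point:
    """Pick the point minimizing (phi_1, phi_2, ...) lexicographically."""
    # Tuples compare element by element, so a tie in phi_1 is settled by
    # phi_2, a remaining tie by phi_3, and so on.
    return min(points, key=lambda y: tuple(phi(y) for phi in phis))


# Hypothetical sample in which two points tie on the first function.
points = [(1, 5), (1, 2), (3, 0)]
phis = [lambda y: y[0], lambda y: y[1]]
print(lexicographic_min(points, phis))
```

If every function ties, `min` returns the first of the tied points, which
matches the remark that the choice among them no longer matters.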

In conclusion, this chapter encompasses developments and refinements of the
theory and methods of distribution-free tolerance regions specifically for
application to chance-constrained linear programming. Chapter III merges
this material with the theory and methods of linear programming to
formulate new procedures for chance-constrained linear programming with
distribution-free constraints.


CHAPTER  III 


DISTRIBUTION-FREE CONSTRAINT SETS

In this chapter methods are developed for constructing a distribution-free
set $S(\alpha,\beta)$ such that for any $x \in S(\alpha,\beta)$ it can be
asserted with a preassigned confidence, $\beta$, that a constraint will hold
at least 100$\alpha$ percent of the time. The required number of samples is
a direct function of the values assigned to $\alpha$ and $\beta$.

The  Distribution-free  Set 

The meaning of a distribution-free set $S$ is best explained by considering
the chance-constraint

$$\Pr[Ax \le B] \ge \alpha \tag{3.1}$$

where $A$ and $B$ are random variables. Let $C_{(1)}, \ldots, C_{(N)}$ be
the order statistics of a sample of size N from the distribution of
$C = B/A$. If $U_1, \ldots, U_{N+1}$ are the coverages associated with the
random intervals
$(-\infty, C_{(1)}],\ (C_{(1)}, C_{(2)}],\ \ldots,\ (C_{(N)}, +\infty)$ and
$U'$ is the sum of the coverages $U_{m+1}, \ldots, U_{N+1}$, there exists a
$\beta$ for which $\Pr[U' \ge \alpha] = \beta$. The random interval
$[C_{(m)}, +\infty)$ is a 100$\alpha$ percent distribution-free interval at
probability level $\beta$. Thus, if $c_{(m)}$ is the observed value of
$C_{(m)}$, it can be asserted with a confidence, $\beta$, that
$\Pr[C \ge c_{(m)}] \ge \alpha$. Then if
$S(\alpha,\beta) = \{x \mid x \le c_{(m)}\}$, for any
$x \in S(\alpha,\beta)$ it can be asserted with a confidence, $\beta$, that
$\Pr[C \ge x] \ge \alpha$ or $\Pr[Ax \le B] \ge \alpha$.

For the general case of q chance-constraints it is desired to find q
distribution-free sets with the above property. That is, for any
$x \in S^i$ it can be asserted with a confidence of at least $\beta_i$ that
$\Pr[A_i x \le B_i] \ge \alpha_i$. Maximization of the objective function
would then be over the intersection of these q sets. The next section
describes the fundamental approach to be taken in this research for
constructing a distribution-free set for a particular constraint. For
convenience the superscript, i, is omitted, and the right-hand side, $B$,
fixed at one. When $B$ is random, the procedures which follow are
applicable to the random vector $A/B$.

Constructing  a  Distribution-free  Set 

The approach for constructing a distribution-free set can be described in
two basic steps.

1. Construct a 100$\alpha$ percent distribution-free tolerance region with
confidence, $\beta$, from samples of the elements of $A$. Denote this region
by $T(\alpha,\beta)$.

2. Determine the set $S(\alpha,\beta)$ such that for any
$x \in S(\alpha,\beta)$, $Ax \le 1\ \forall A \in T(\alpha,\beta)$.

The justification for taking these steps is that for any
$x \in S(\alpha,\beta)$ the half-space $\{A \mid Ax \le 1\}$ contains a
100$\alpha$ percent distribution-free tolerance region with confidence
$\beta$. Hence, for any $x \in S(\alpha,\beta)$ it can be asserted with a
confidence of at least $\beta$ that $\Pr[Ax \le 1] \ge \alpha$.

To illustrate the construction of a deterministic set $S(\alpha,\beta)$
from a distribution-free tolerance region, let such a region be constructed
by removing a statistical block with the linear cutting function
$\phi = A\gamma$. The elements of $\gamma$ are assumed to be arbitrarily
chosen and constant. This region is given by

$$T(\alpha,\beta) = \{A \mid A\gamma^* \le 1\}$$

where

$$\gamma_j^* = \gamma_j / W, \quad j = 1, \ldots, n$$

and

$$W = \max_k \phi(A_k), \quad k = 1, \ldots, N$$

The desired set corresponding to this region would then be given by

$$S(\alpha,\beta) = \{x \mid x = \lambda\gamma^*,\ 0 \le \lambda \le 1,\ x \ge 0\}$$

Example 3.1 further illustrates this approach.

Example  3.1 

Suppose it is desired to find a set $S(.5, .95)$ such that for any
$x = (x_1, x_2) \in S(.5, .95)$ it can be asserted with a confidence of at
least .95 that

$$\Pr[A_1 x_1 + A_2 x_2 \le 1] \ge .50$$

Choosing $\gamma_1 = 1$ and $\gamma_2 = 2$ (arbitrarily), the function
$\phi = A_1 + 2A_2$ is used to generate a tolerance region $T(.5, .95)$.
The required sample size is then the smallest integer value N satisfying
the relationship

$$I_{.5}(1, N) \ge .95$$

which can be shown to be N = 5.

To illustrate, a random sample of size N = 5 was taken from a population of
independently and identically distributed normal variates with means and
variances of 3 and 1, respectively. The resulting sample values are
$(A_{1,1}, A_{2,1}) = (3.485, 2.618)$; $(A_{1,2}, A_{2,2}) = (4.345, 1.398)$;
$(A_{1,3}, A_{2,3}) = (.538, 1.534)$; $(A_{1,4}, A_{2,4}) = (3.043, .361)$;
and $(A_{1,5}, A_{2,5}) = (2.084, 3.598)$. Then
$W = \max_k (A_{1,k} + 2A_{2,k}) = 2.084 + 2(3.598) = 9.280$, and
$T(.5, .95) = \{(A_1, A_2) \mid .108 A_1 + .216 A_2 \le 1\}$.

The corresponding distribution-free set is then

$$S(.5, .95) = \{(x_1, x_2) \mid (x_1, x_2) = \lambda(.108, .216),\ 0 \le \lambda \le 1,\ (x_1, x_2) \ge 0\}$$

A scatter diagram of the original sample points is given in Figure 3.1,
along with the tolerance region $T(.5, .95)$. Figure 3.2 contains a
graphical representation of the set $S(.5, .95)$.
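
The arithmetic of Example 3.1 can be reproduced with a short script. This
is a sketch using the five sample values quoted above; the variable and
function names are hypothetical and chosen only for illustration.

```python
sample = [(3.485, 2.618), (4.345, 1.398), (0.538, 1.534),
          (3.043, 0.361), (2.084, 3.598)]
gamma = (1.0, 2.0)  # arbitrary weights chosen in the example


def phi(a):
    """Linear cutting function phi(A) = A . gamma."""
    return sum(ai * gi for ai, gi in zip(a, gamma))


W = max(phi(a) for a in sample)            # largest cutting-function value
gamma_star = tuple(g / W for g in gamma)   # half-space A . gamma* <= 1


def in_S(x, tol=1e-9):
    """True if x = lam * gamma_star for some 0 <= lam <= 1."""
    lam = x[0] / gamma_star[0]
    on_ray = all(abs(xi - lam * gi) <= tol for xi, gi in zip(x, gamma_star))
    return on_ray and -tol <= lam <= 1 + tol


print(round(W, 3), tuple(round(g, 3) for g in gamma_star))
```

Every sample point satisfies $A\gamma^* \le 1$ by construction, since $W$
is the largest value of $\phi$ in the sample.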
The foregoing approach was used to illustrate the notion of constructing a
deterministic set $S$ from a distribution-free tolerance region $T$. The
relative merit of such an approach is dubious when dealing with more than
one constraint, since the choice of x is restricted to points along a
vector in $R^n$. The same cutting function must be used for each
constraint; otherwise the only choice for x would be the origin. The two
methods which follow provide considerably more freedom in the choice of the
shape of the distribution-free regions corresponding to each constraint.

A  Distribution-free  Linear  Constraint  Set 

It is possible to represent a distribution-free set as a linear constraint
set in the following manner. First construct a distribution-free tolerance
region $T_L(\alpha,\beta)$ using a sequence of cutting functions of the form

$$\phi_j = A_j, \quad j = 1, \ldots, n$$

The resulting region would be given by

$$T_L(\alpha,\beta) = \{A \mid A \le W\} \tag{3.2}$$

where the elements of $W = (W_1, \ldots, W_n)$ are determined from
Eqs. (2.3) and (2.6) of the previous chapter. The desired set is then
given by

$$S_L(\alpha,\beta) = \{x \mid Wx \le 1,\ x \ge 0\} \tag{3.3}$$

as is evidenced by the following theorem.

Theorem  3.1 

Let $T_L$ and $S_L$ be given by Eqs. (3.2) and (3.3), respectively. (For
convenience the $(\alpha,\beta)$ designation is deleted.) Then a necessary
and sufficient condition for $Ax \le 1\ \forall A \in T_L$ is that
$x \in S_L$.

Proof

(Sufficient)

Assume $x \in S_L$. Show

$$Ax \le 1 \quad \forall A \in T_L \tag{3.4}$$

Rewrite Eq. (3.4) as

$$Wx - 1 \le (W - A)x \tag{3.5}$$

Now the r.h.s. of Eq. (3.5) is always greater than or equal to zero
$\forall A \in T_L$, since $W - A \ge 0$ and $x \ge 0$, so the inequality
will always hold provided the l.h.s. is non-positive. Since by definition
of the set $S_L$

$$Wx - 1 \le 0$$

the sufficiency part of the proof is complete. The necessary part of the
proof follows since

$$Ax \le 1 \quad \forall A \in T_L$$

and in particular, since $W \in T_L$, the relationship

$$Wx \le 1$$

must be satisfied.

Example  3.2 

For comparative purposes the problem stated in Example 3.1 will be used,
along with the same five sample values. However, before proceeding it is
necessary to increase the sample size to N = 8, since this is the smallest
number for which

$$I_{.5}(2, N-1) \ge .95$$

The additional simulated sample values are found to be (2.502, 2.972),
(2.341, 2.143) and (3.456, 4.116).

Figure 3.3 contains a scatter diagram of the eight sample values along with
the resulting tolerance region

$$T_L(.5, .95) = \{(A_1, A_2) \mid A_1 \le 4.345,\ A_2 \le 4.116\}$$

The desired linear set (shown in Figure 3.4) is then given by

$$S_L(.5, .95) = \{(x_1, x_2) \mid 4.345 x_1 + 4.116 x_2 \le 1,\ (x_1, x_2) \ge 0\}$$
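
The thresholds of Example 3.2 follow from two successive coordinate cuts.
The sketch below uses the eight sample values quoted in the examples (with
the seventh value read as (2.341, 2.143)); the names are hypothetical.

```python
sample = [(3.485, 2.618), (4.345, 1.398), (0.538, 1.534), (3.043, 0.361),
          (2.084, 3.598), (2.502, 2.972), (2.341, 2.143), (3.456, 4.116)]

remaining = list(sample)
W = []
for j in range(2):                       # one coordinate cut per variable
    cut_point = max(remaining, key=lambda a: a[j])
    W.append(cut_point[j])
    remaining.remove(cut_point)          # a sample point defines only one cut


def in_SL(x):
    """Membership in S_L = {x | W x <= 1, x >= 0}."""
    return all(xi >= 0 for xi in x) and sum(w * xi for w, xi in zip(W, x)) <= 1


print(W, in_SL((0.1, 0.1)), in_SL((0.2, 0.1)))
```

For any x accepted by `in_SL`, every point A with $A \le W$ satisfies
$Ax \le Wx \le 1$, which is the content of Theorem 3.1.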

A major disadvantage of the above approach is that as the number of
variables, n, increases, so does the number of required cuts, m. This in
turn requires a larger sample size N for fixed levels of $\alpha$ and
$\beta$. As is shown in Table 3.1, with $\alpha = .90$ and $\beta = .95$,
the size of N for even modest values of m is quite large. For the case of
limited or costly data, this restriction could be very significant.

Table 3.1

Values of m and N with $\alpha = .90$; $\beta = .95$

      m       N
      5      90
     10     155
     15     215
     20     275
     25     335
     30     390
     40     500
     50     600

The  next  section  shows  how  a  spherical  cutting  function 
can  be  used  to  construct  a  convex  distribution-free  con¬ 
straint  set  without  the  above  restriction. 


A Distribution-free Convex Constraint Set


Suppose a distribution-free tolerance region is constructed via the
cutting function

$$\phi(A) = |A - d| = \left[\sum_{j=1}^{n} (A_j - d_j)^2\right]^{1/2}$$

where $d$ is a row vector of preassigned constants. The resulting
tolerance region is given by

$$T_S(\alpha,\beta) = \{A \mid |A - d| \le \rho\} \tag{3.6}$$

where

$$\rho = \max_k |A_k - d|$$

This region is the surface and interior of an n-dimensional hypersphere
centered at $d$ with radius $\rho$. The corresponding distribution-free set
is given by

$$S_S(\alpha,\beta) = \{x \mid |x| \le (1 - dx)/\rho,\ x \ge 0\}$$

Theorem 3.2

Let $T_S$ be given by Eq. (3.6). A necessary and sufficient condition for
$Ax \le 1\ \forall A \in T_S$ is

$$|x| \le (1 - dx)/\rho \tag{3.7}$$

Proof

(Sufficient)

Assume Eq. (3.7) holds. Show that $Ax \le 1\ \forall A \in T_S$.

$$\begin{aligned}
Ax &= dx + (A - d)x \\
   &\le dx + |A - d|\,|x| \\
   &\le dx + [(1 - dx)/\rho]\,\rho \\
   &= 1
\end{aligned}$$

(Necessary)

Assume $Ax \le 1\ \forall A \in T_S$. Show that Eq. (3.7) holds.

$$Ax \le 1$$

$$dx + (A - d)x \le 1$$

$$(A - d)x \le 1 - dx$$

$$|A - d|\,|x| \cos(A - d, x) \le 1 - dx \tag{3.8}$$

Note that if Eq. (3.8) holds for points on the surface of the hypersphere
defined by $T_S$, then this relationship also holds for all points
contained in this hypersphere. Thus $|A - d|$ can be replaced with $\rho$
in Eq. (3.8) to give

$$|x| \cos(A - d, x) \le (1 - dx)/\rho \tag{3.9}$$

For Eq. (3.9) to hold, it must hold for an $A^*$ on the surface of $T_S$
for which $\cos(A^* - d, x) = 1$. Such a point exists and is given by
$A^* = [(\rho x)/|x|] + d$. Replacing $\cos(A - d, x)$ with
$\cos(A^* - d, x) = 1$ in Eq. (3.9) yields

$$|x| \le (1 - dx)/\rho$$

and the end of the necessary part of the proof.
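
Theorem 3.2 can be checked numerically by comparing condition (3.7) with
the worst case of $Ax$ over the hypersphere, which the proof locates at
$A^* = d + \rho x/|x|$. The sketch below uses a hypothetical center and
radius and hypothetical function names.

```python
from math import sqrt


def norm(v):
    """Euclidean length of a vector."""
    return sqrt(sum(c * c for c in v))


def dot(u, v):
    """Inner product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def in_SS(x, d, rho):
    """Condition (3.7), |x| <= (1 - d x) / rho, together with x >= 0."""
    return all(c >= 0 for c in x) and norm(x) <= (1.0 - dot(d, x)) / rho


def worst_case(x, d, rho):
    """Largest A x over |A - d| <= rho, attained at A* = d + rho x / |x|."""
    return dot(d, x) + rho * norm(x)


d, rho = (1.0, 0.5), 2.0  # hypothetical center and radius
for x in [(0.1, 0.2), (0.3, 0.3), (0.0, 0.3)]:
    print(x, in_SS(x, d, rho), round(worst_case(x, d, rho), 4))
```

A point passes `in_SS` exactly when its worst case does not exceed one, so
the two columns of output should agree in that sense.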

The convexity of the set $S_S$ can be proven by letting
$x^* = \lambda x_1 + (1 - \lambda) x_2$ where $0 \le \lambda \le 1$ and
$x_1, x_2 \in S_S$. Then

$$\begin{aligned}
|x^*| - (1 - dx^*)/\rho
 &= |\lambda x_1 + (1 - \lambda) x_2| - [1 - d(\lambda x_1 + (1 - \lambda) x_2)]/\rho \\
 &\le \lambda |x_1| + (1 - \lambda)|x_2| - [1 - \lambda\, d x_1 - (1 - \lambda)\, d x_2]/\rho \\
 &\le [\lambda (1 - d x_1) + (1 - \lambda)(1 - d x_2) - (1 - \lambda\, d x_1 - (1 - \lambda)\, d x_2)]/\rho \\
 &= [\lambda + (1 - \lambda) - 1]/\rho \\
 &= 0
\end{aligned}$$

The relationship (3.7) can be described geometrically as the surface and
interior of a sphere, ellipsoid, paraboloid or one nappe of a hyperboloid
depending on $\delta$ (i.e., $\delta = \rho$, $> 0$, $= 0$ or $< 0$) where
$\delta = \rho - |d|$. This is illustrated in Figures 3.5 through 3.8,
where $d$ varies and $\rho$ remains fixed at .5. Also included in these
figures are the corresponding tolerance regions described by Eq. (3.6).

Example  3.3  illustrates  the  foregoing  method  with 
respect  to  the  preceding  examples. 

Example  3.3 

For convenience, the circular cutting function with
$d = (0,0)$ is considered. Since only one cut is required,
the original five sample points are used to determine the
value of $\rho = (A_1^2 + A_2^2)^{1/2} = 4.5$. Then the resulting
tolerance region (shown in Figure 3.4) is given by

$$T_S(.5, .95) = \{(A_1, A_2) \mid (A_1^2 + A_2^2)^{1/2} \le 4.5\}$$

and the corresponding distribution-free set (shown in
Figure 3.5) is given by

$$S_S(.5, .95) = \{(x_1, x_2) \mid (x_1^2 + x_2^2)^{1/2} \le 1/4.5,\ (x_1, x_2) \ge (0,0)\}$$

The  tolerance  regions  and  distribution-free  sets  of 
the  foregoing  examples  are  shown  together  in  Figures  3.9 
and  3.10,  respectively. 

Expanding  the  Size  of  a  Distribution-free  Set 

This section is concerned with the problem of expanding
the size of a distribution-free set, $S(\alpha,\beta)$, after it
has been constructed from a sample of size $N$. Such an
expansion might be motivated by an undesirable value of the
objective function obtained by maximizing over $S(\alpha,\beta)$. If
this occurs in the use of the Quantile or Tchebysheff
Methods, the sets can be expanded by reducing the preassigned
probability level, $\alpha$, for constraint satisfaction.
This results in a smaller value of $K_\alpha$ or $\alpha/(1 - \alpha)$ and thus
increases the size of the respective sets $S_Q$, $S_T$ as described
by Eqs. (1.3) and (1.4) in Chapter I. In the case
of a distribution-free set the problem could be similarly
resolved by reassigning lower levels of $\alpha$ and $\beta$ and repeating
the construction procedure with reduced sample sizes.

If there are no additional samples available, then it may be
possible to obtain a larger set by reducing the original tolerance
region $T(\alpha,\beta)$ by taking additional cuts. The coverage of
the resulting region is described in the following theorem.
(The proof of Theorem 3.3 is presented in the Appendix.)


Figure 3.10. $T_S(.5, .95)$ for Example 3.3
Theorem 3.3

Let $U^{m}$ be the coverage of the region $T(\alpha,\beta)$ constructed
from a sample of size $N$ by removing $m$ blocks. Let $U^{m+m'}$ be
the coverage of the region $T'(\alpha',\beta')$ formed by removing $m'$
additional blocks from $T(\alpha,\beta)$. Then

$$\Pr[U^{m+m'} \ge \alpha'] = 1 - I_{\alpha'/U^{m}}(N-m-m'+1,\ m') = \beta' \tag{3.8}$$

From the above theorem it is seen that the confidence
level, $\beta'$, associated with the region $T'(\alpha',\beta')$ is dependent
upon the coverage, $U^{m}$, of the original region, $T(\alpha,\beta)$.

Once the sample has been drawn, $U^{m}$ is a fixed but unknown
quantity. Thus, it is impossible to determine the value
of $\beta'$ for a given level of $\alpha'$. However, relationship (3.8)
can be used to approximate the coverage, $U^{m+m'}$, by replacing
$U^{m}$ with a suitable estimate. One such estimate is the
original value of $\alpha$, since it is known with a confidence
of at least $\beta$ that $U^{m} \ge \alpha$.
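
The approximation described above can be sketched in Python. This is a minimal illustration, not the report's own procedure: the function names are hypothetical, and the incomplete beta ratio is evaluated through the standard identity I_x(a, b) = P[Binomial(a + b - 1, x) >= a], which holds for positive integers a and b. The unknown coverage of the original region is replaced by an estimate supplied by the caller, for example the original probability level.

```python
from math import comb


def reg_inc_beta_int(x, a, b):
    """Regularized incomplete beta I_x(a, b) for positive integers a, b.

    Uses the identity I_x(a, b) = P[Binomial(a + b - 1, x) >= a].
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(a, n + 1))


def approx_beta_prime(alpha_prime, coverage_estimate, n_samples, m, m_prime):
    """Approximate the confidence of Theorem 3.3.

    coverage_estimate stands in for the unknown coverage of the original
    region (an assumption of this sketch, e.g. the original alpha).
    """
    ratio = alpha_prime / coverage_estimate
    if ratio >= 1.0:
        return 0.0  # the reduced region cannot cover more than the original
    return 1.0 - reg_inc_beta_int(ratio, n_samples - m - m_prime + 1, m_prime)
```

For instance, approx_beta_prime(0.85, 0.90, 29, 1, 1) evaluates the right-hand side of the theorem for one additional cut on a region built from 29 samples, with the original level .90 used as the coverage estimate.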


CHAPTER  IV 


EXPERIMENTATION  AND  COMPUTATIONAL  RESULTS 


This chapter includes the results of investigations
into the performance of linear and spherical distribution-free
constraint sets using simulated data from a non-normal
distribution. The value of such investigations is two-fold.
First, they provide a clearer understanding of the meaning
and interrelationship of the parameters $\alpha$ and $\beta$. Second,
they provide a means of comparing the relative merit of a
distribution-free set versus one obtained using the Quantile
or Tchebysheff Method in the absence of any knowledge of the
underlying distribution.

Notation  and  Assumptions 

Consider the single chance-constraint

$$\Pr[A x \le 1] \ge \alpha, \quad x \ge 0 \tag{4.1}$$

Let $S_S(\alpha,\beta)$ denote a convex distribution-free set as described
in the previous chapter. All $x \in S_S(\alpha,\beta)$ will
satisfy (4.1) with a confidence of at least $\beta$. Let $S_L(\alpha,\beta)$
denote a linear distribution-free set with the same property.
$S_T(\alpha)$ and $S_Q(\alpha)$ denote sets obtained by the Tchebysheff and
Quantile Methods. The information required to construct
the above sets is based upon sample data from independent
and identically distributed gamma variates with parameters
p = 10 and v = 5. These variates were generated on an IBM
360-65 computer using the FORTRAN program suggested in
reference [28].


Construction of $S_S(\alpha,\beta)$, $S_L(\alpha,\beta)$, $S_T(\alpha)$ and $S_Q(\alpha)$


Consider the case of $n = 2$, and suppose it is desired
to find a region in the positive quadrant of $x = (x_1, x_2)$
such that any point in this region will satisfy the
chance-constraint

$$\Pr[A_1 x_1 + A_2 x_2 \le 1] \ge .90$$

Such a region can be determined with a confidence of .95
from a sample of size $N = 29$. For the purpose of generality
this region is constructed using a spherical cutting function
with $d = (0,0)$. Table 4.1 contains the 29 simulated sample
points $(A_{1k}, A_{2k})$, $k = 1, \ldots, 29$. The sixth sample value
(.706, .734) yields the maximum value of
$\rho^2 = (.706)^2 + (.734)^2 = 1.037$. The resulting
distribution-free set is

$$S_S(.90, .95) = \{(x_1, x_2) \mid (x_1^2 + x_2^2)^{1/2} \le (1/1.037)^{1/2}\}$$
Table 4.1

29 Simulated Samples of $(A_1, A_2)$

 k    A1k    A2k   |   k    A1k    A2k   |   k    A1k    A2k
 1   .493   .380   |  11   .288   .383   |  21   .492   .527
 2   .773   .405   |  12   .615   .365   |  22   .623   .777
 3   .490   .382   |  13   .293   .303   |  23   .252   .565
 4   .384   .472   |  14   .358   .450   |  24   .405   .398
 5   .277   .456   |  15   .651   .525   |  25   .718   .408
 6   .706   .734   |  16   .685   .412   |  26   .484   .215
 7   .635   .416   |  17   .421   .070   |  27   .890   .399
 8   .446   .670   |  18   .489   .630   |  28   .317   .725
 9   .172   .122   |  19   .650   .458   |  29   .232   .329
10   .366   .625   |  20   .419   .592   |

To illustrate empirically the meaning of $\alpha = .90$ and
$\beta = .95$, the above procedure for constructing $S_S(.90, .95)$
was repeated for 99 additional sample sets of size $N = 29$.
For each set, the surface point $x^*$ for which $x_1^* = x_2^*$ was
selected and, in 1,000 realizations of the random variables
$A_1, A_2$, the number of times that the relationship
$A_1 x_1^* + A_2 x_2^* \le 1$ was satisfied was recorded and denoted
as ALPHA. In BETA = 96 of the 100 trials, the value of ALPHA was
found to be greater than or equal to 900.
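
The experiment just described can be sketched as a small Monte Carlo routine. The code below is an assumption-laden illustration rather than the original FORTRAN program: the function names are hypothetical, Python's random.gammavariate replaces the generator of reference [28], and the gamma parameters are read as shape 5 and scale 0.1 (mean .5), which is only one possible interpretation of the parameters quoted in the text.

```python
import math
import random


def run_trials(n_trials=100, n_sample=29, n_check=1000,
               shape=5.0, scale=0.1, seed=1):
    """Return the ALPHA count observed in each trial."""
    rng = random.Random(seed)

    def draw():
        # One realization of (A1, A2); shape and scale are assumed values.
        return (rng.gammavariate(shape, scale), rng.gammavariate(shape, scale))

    alphas = []
    for _ in range(n_trials):
        # One cut with d = (0, 0): the radius is the largest sample norm.
        rho = max(math.hypot(*draw()) for _ in range(n_sample))
        # Surface point of |x| <= 1/rho with x1 = x2.
        xs = (1.0 / rho) / math.sqrt(2.0)
        alpha = sum(1 for _ in range(n_check)
                    if sum(a * xs for a in draw()) <= 1.0)
        alphas.append(alpha)
    return alphas


def beta_count(alphas, threshold=900):
    """Number of trials whose ALPHA is at least the threshold (BETA)."""
    return sum(1 for a in alphas if a >= threshold)
```

Calling beta_count(run_trials()) gives the analogue of BETA for ALPHA = 900; passing other thresholds (800, 850, 950, and so on) gives the remaining rows of a table of observed values. The counts depend on the seed and on the assumed gamma parameters.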

Table 4.2 exhibits these values along with other observed
values of BETA for various values of ALPHA. These
observations can be compared with actual values of $\alpha$ and $\beta$
with $N = 29$ and $m = 1$ in Table 4.3.
Table 4.2

Observed Values of ALPHA and BETA

ALPHA   BETA
 800     99
 850     99
 900     96
 950     76
 960     70
 970     59

Table 4.3

Actual Values of $\alpha$ and $\beta$ with $N = 29$ and $m = 1$

$\alpha$    $\beta$
.800    .998
.850    .992
.900    .953
.950    .775
.960    .697
.970    .585
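
The relationship between $\alpha$, $\beta$, $N$ and $m$ can be sketched in Python under the standard result that the coverage of a region left after removing m blocks follows a Beta(N - m + 1, m) distribution. The function names are hypothetical, and the values produced agree with the tabulated ones only approximately (the table may have been rounded or read from a chart).

```python
from math import comb


def confidence(alpha, n_samples, m):
    """P[coverage >= alpha] when coverage ~ Beta(n_samples - m + 1, m).

    Equivalent binomial form: P[Binomial(n_samples, alpha) <= n_samples - m].
    """
    return sum(
        comb(n_samples, j) * alpha**j * (1.0 - alpha) ** (n_samples - j)
        for j in range(0, n_samples - m + 1)
    )


def min_sample_size(alpha, beta, m, n_max=10_000):
    """Smallest N with confidence(alpha, N, m) >= beta, or None if not found."""
    for n in range(m, n_max + 1):
        if confidence(alpha, n, m) >= beta:
            return n
    return None
```

For m = 1 the confidence reduces to 1 - alpha**N, and min_sample_size(0.90, 0.95, 1) returns 29, the sample size used in the text.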

A linear distribution-free set $S_L(.90, .95)$ would require
additional sample points since two cuts are needed
(as opposed to one in the spherical case). Rather than
taking any more samples, the set $S_L(.90, .83)$ is constructed
from the original sample of size 29. The maximum value of
$A_{1k}$ is given by $A_{1,27} = .890$. The maximum value of $A_{2k}$
(after deleting $A_{2,27}$) is $A_{2,22} = .777$. The resulting
linear set is given by

$$S_L(.90, .83) = \{(x_1, x_2) \mid .890\,x_1 + .777\,x_2 \le 1\}$$
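
Both constructions can be reproduced from the sample of Table 4.1. The sketch below is illustrative: the function names are hypothetical, and the data are transcribed from the table as printed. The spherical cut takes the sample point farthest from the shaping vector, and the linear cuts take the largest first coordinate and then the largest second coordinate among the remaining points.

```python
import math

# Sample points (A1k, A2k), k = 1..29, transcribed from Table 4.1.
SAMPLE = [
    (.493, .380), (.773, .405), (.490, .382), (.384, .472), (.277, .456),
    (.706, .734), (.635, .416), (.446, .670), (.172, .122), (.366, .625),
    (.288, .383), (.615, .365), (.293, .303), (.358, .450), (.651, .525),
    (.685, .412), (.421, .070), (.489, .630), (.650, .458), (.419, .592),
    (.492, .527), (.623, .777), (.252, .565), (.405, .398), (.718, .408),
    (.484, .215), (.890, .399), (.317, .725), (.232, .329),
]


def spherical_cut(sample, d=(0.0, 0.0)):
    """Return (k, radius): 1-based index and distance of the farthest point from d."""
    dist = [math.hypot(a1 - d[0], a2 - d[1]) for a1, a2 in sample]
    k = max(range(len(sample)), key=dist.__getitem__)
    return k + 1, dist[k]


def linear_cuts(sample):
    """Return the coefficients (a1, a2) of a1*x1 + a2*x2 <= 1.

    The first cut removes the point with the largest A1; the second takes
    the largest A2 among the remaining points.
    """
    k1 = max(range(len(sample)), key=lambda k: sample[k][0])
    rest = [p for k, p in enumerate(sample) if k != k1]
    return sample[k1][0], max(p[1] for p in rest)
```

With the data above, spherical_cut(SAMPLE) selects the sixth sample point, whose squared distance from the origin is about 1.037, and linear_cuts(SAMPLE) returns (.890, .777).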

To construct the set $S_T(.90)$ using the Tchebysheff
Method, it is necessary to calculate sample means and variances
from the 29 sample values of Table 4.1. The resulting
set is given by

$$S_T(.90) = \{(x_1, x_2) \mid .484\,x_1 + .455\,x_2 + 3.0(.033\,x_1^2 + .028\,x_2^2)^{1/2} \le 1\}$$

Assuming (erroneously) $A_1$ and $A_2$ to be independent normal
variates, the Quantile Method could be used to generate
the set

$$S_Q(.90) = \{(x_1, x_2) \mid .484\,x_1 + .455\,x_2 + 1.282(.033\,x_1^2 + .028\,x_2^2)^{1/2} \le 1\}$$
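
Membership in these two sets can be tested with a few lines of Python. The sketch is hedged: the function names are hypothetical, the means and variances are the rounded values quoted in the text, the right-hand side of both constraints is taken to be 1 (the bound in the chance-constraint $A x \le 1$), and the Tchebysheff multiplier is assumed to be the square root of $\alpha/(1 - \alpha)$, which equals 3.0 at $\alpha = .90$.

```python
import math

MEANS = (.484, .455)       # sample means quoted in the text
VARIANCES = (.033, .028)   # sample variances quoted in the text


def lhs(x, k):
    """Mean of A.x plus k standard deviations, for independent A1 and A2."""
    mean = MEANS[0] * x[0] + MEANS[1] * x[1]
    sd = math.sqrt(VARIANCES[0] * x[0] ** 2 + VARIANCES[1] * x[1] ** 2)
    return mean + k * sd


def in_tchebysheff_set(x, alpha=0.90):
    """Tchebysheff constraint with multiplier sqrt(alpha / (1 - alpha))."""
    return lhs(x, math.sqrt(alpha / (1.0 - alpha))) <= 1.0


def in_quantile_set(x, z=1.282):
    """Quantile constraint with the standard normal .90 quantile."""
    return lhs(x, z) <= 1.0
```

Evaluating lhs((.97, 0), 3.0) and lhs((1.40, 0), 1.282) gives values within .01 of 1, so these points lie close to the boundaries of the Tchebysheff and Quantile sets, respectively.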
Comparative Analysis

To illustrate geometrically the relative accuracy of
the above sets with respect to the true set
$S = \{(x_1, x_2) \mid \Pr[A_1 x_1 + A_2 x_2 \le 1] \ge .90\}$, the boundary
of this set was approximated in the following manner. For a fixed
value of $x_1$, the value of $x_2$ was incremented in units of
.02 until $|\text{ALPHA} - 900| \le 5$. The procedure
was then repeated for incremental (.2) values of $x_1$. The
resulting values of $x_1$, $x_2$, and ALPHA are presented in Table
4.4.
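
The boundary search can be sketched as follows. This is an illustration under stated assumptions, not the original program: the function names are hypothetical, the gamma parameters are read as shape 5 and scale 0.1, and the stopping rule is relaxed to "ALPHA first falls to 905 or below" so that the loop terminates even when sampling noise makes ALPHA skip over the band around 900.

```python
import random


def alpha_count(x1, x2, rng, n_check=1000, shape=5.0, scale=0.1):
    """Number of draws (out of n_check) with A1*x1 + A2*x2 <= 1."""
    return sum(
        1 for _ in range(n_check)
        if rng.gammavariate(shape, scale) * x1
        + rng.gammavariate(shape, scale) * x2 <= 1.0
    )


def scan_boundary(x1, rng, step=0.02, target=900, tol=5, x2_max=5.0):
    """Increase x2 from 0 until ALPHA first falls to target + tol or below.

    Returns (x2, ALPHA) at the stopping point, or None if x2_max is reached.
    """
    x2 = 0.0
    while x2 <= x2_max:
        alpha = alpha_count(x1, x2, rng)
        if alpha <= target + tol:
            return x2, alpha
        x2 += step
    return None
```

Calling scan_boundary for x1 = 0, .2, ..., 1 with a shared random.Random instance yields a set of approximate boundary points of the same kind as those tabulated; the exact values depend on the seed and on the assumed parameters.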


Table 4.4

Approximate Boundary Points of Actual Region S

$x_1$    $x_2$    ALPHA
 0      1.16     898
 .2     1.06     903
 .4      .96     895
 .6      .80     904
 .8      .58     901
1        .32     896

These points were used to approximate the true region
with the region $S$ shown in Figure 4.1, which also contains
the sets $S_S(.90, .95)$, $S_L(.90, .83)$, $S_T(.90)$ and $S_Q(.90)$.

Figure 4.1. Comparison of chance-constrained sets with
no knowledge of the underlying distribution.
The following observations are made from this figure.

A. The sets $S_S$, $S_L$, $S_T$ are conservative in that
   their boundary points satisfy the constraint
   more than 90 percent of the time.

B. The set $S_T$ obtained via the Tchebysheff Method is
   the most conservative.

C. The set $S_Q$ obtained via the Quantile Method yields
   a considerably larger region, but points along the
   boundary will violate the constraint more than
   10 percent of the time.

D. The boundaries of the sets $S_Q$ and $S_T$ follow the
   shape of the true boundary more closely than those
   of either $S_S$ or $S_L$.

The first three observations are illustrated numerically
by considering various points along the respective boundaries
and checking the constraint satisfaction with samples
of size $N = 1,000$. In particular, the points considered
are those which maximize the value of

$$z = c_1 x_1 + c_2 x_2$$

for values of $c = (1,1)$, $(2,1)$ and $(4,1)$. The resulting
values $x_1$, $x_2$ and corresponding values of ALPHA for each set
are presented in Table 4.5.

Table 4.5

Empirical Constraint Satisfaction

  c       Set      $x_1$    $x_2$    ALPHA
(1,1)    $S_S$     .69     .69      905
         $S_L$     .00    1.24      892
         $S_T$     .54     .66      962
         $S_Q$     .70     .92      781
(2,1)    $S_S$     .88     .42      917
         $S_L$    1.12     .00      926
         $S_T$     .92     .12      976
         $S_Q$    1.40     .00      826
(4,1)    $S_S$     .96     .24      922
         $S_L$    1.12     .00      926
         $S_T$     .97     .00      973
         $S_Q$    1.40     .00      826

To determine if the observations A, B and C could be
made for higher dimensions, a similar analysis is first
performed for the case of $n = 10$. The boundary points considered
are those which maximize the value of $z = \sum_{j=1}^{10} x_j$. The
solutions are obtained using the Sequential Unconstrained
Minimization Technique (SUMT) developed by Fiacco and
McCormick [29]. The resulting values of $z$ are presented
in Table 4.6 along with the corresponding values of ALPHA.
These results relate to A, B and C in the following manner.

A'. The sets $S_S$ and $S_T$ are still conservative, but
    it is now possible to generate a set $S_L$ which
    can contain points violating the constraint more
    than the preassigned level of $1 - \alpha = .10$. [This
    is to be expected, since the level of confidence
    is only equal to .001 (see Table 3.1).]

B'. The set $S_T$ is still more conservative in the
    majority of the trials, but not substantially so
    when compared with the set $S_S$.

C'. Depending upon the particular sample values drawn,
    the corresponding set $S_Q$ may or may not contain
    points on the boundary which violate the constraint
    more than 10 percent of the time.

Table 4.6

Values of ALPHA and z with $n = 10$

                   ALPHA
Run    $S_S$    $S_L$    $S_T$    $S_Q$
  1     982      874      984      764
  2     988      919      994    1,000
  3     984      920      994      913
  4     991      774      999      912
  5     977      900      989      841
  6     979      943      999      918
  7     988      883      987      886
  8     995      824      999      958
  9     995      908      991      968
 10     989      882      995      954

                     z
Run    $S_S$    $S_L$    $S_T$    $S_Q$
  1    1.526    1.309    1.494    1.793
  2    1.473    1.132    1.391    1.119
  3    1.513    1.178    1.411    1.652
  4    1.327    1.541    1.312    1.655
  5    1.550    1.251    1.458    1.710
  6    1.539    1.146    1.348    1.664
  7    1.479    1.275    1.463    1.675
  8    1.418    1.425    1.324    1.569
  9    1.419    1.204    1.441    1.518
 10    1.468    1.292    1.412    1.567

The above observations are further supported for the
case of $n = 25$, as shown by the results presented for this
case in Table 4.7.

Table 4.7

Values of ALPHA and z with $n = 25$

                   ALPHA
Run    $S_S$    $S_L$    $S_T$    $S_Q$
  1     995      502    1,000      859
  2     972      797      990      930
  3     992      440      987      859
  4   1,000      795      989      865
  5     996      904      999      885
  6     998      696      989      832
  7     961      777      967      900
  8     996      662      982      911
  9     993      756      988      931
 10     991      883      992      956

                     z
Run    $S_S$    $S_L$    $S_T$    $S_Q$
  1    1.582    2.103    1.411    1.819
  2    1.672    1.469    1.577    1.726
  3    1.614    2.272    1.594    1.829
  4    1.430    1.471    1.616    1.828
  5    1.569    1.234    1.477    1.753
  6*   1.457, 1.605, 1.831
  7*   1.516, 1.652, 1.778
  8    1.573    1.745    1.535    1.781
  9*   1.598, 1.583, 1.748
 10    1.625    1.299    1.480    1.726

* Only three of the four entries are legible for this run;
  their column assignment is uncertain.

In observation D, the boundaries of the sets $S_Q$ and $S_T$
were much more representative of the shape of the true
boundary.

The distribution-free boundaries were not nearly as
representative, since even before the samples were drawn it
was known that the resulting sets $S_S$ and $S_L$ would be circular
and linear, respectively. Although this will always
be the case for the latter set, it need not be for the
former set because the shape of this set can be controlled
by the choice of the vector $d$. There is an infinite number
of choices for the values of the elements in this vector,
and there is no way of telling prior to sampling which
choice yields a more representative shape of the true
boundary. It should be noted, however, that the use of the
true mean values has worked exceptionally well. That is to
say that the set $S'_S$ generated by the cutting function
$\phi = |(A_1, A_2) - (.5, .5)|$ follows the shape of the true
boundary of $S$. This is illustrated in Figure 4.2. This
figure also contains the sets $S_S$, $S$ and $S'_T$, where $S_S$ and
$S$ are as in Figure 4.1 and $S'_T$ is a set obtained from the
Tchebysheff method using actual means and variances. From
Figure 4.2 it is seen that while the set $S'_S$ gives a better
approximation of the shape of the actual region, it is also
more conservative than the set $S_S$. The set $S'_T$ still remains
the most conservative.


CHAPTER  V 


CONCLUSIONS  AND  EXTENSIONS 

In this research methods were developed to deal with the
chance-constrained set, $S = \{x \mid \Pr[A x \le B] \ge \alpha\}$, when any
information concerning the random variables $A_1, \ldots, A_n$ and $B$
must be derived from actual samples. When existing techniques
are employed, it is not possible to relate the accuracy of
sample information to actual constraint satisfaction. The
distribution-free methods which were developed as a result of
this research alleviate the problem by providing a lower bound
on the confidence, $\beta$, that one can associate with a value of
$x$ satisfying the chance-constraint at the preassigned
probability level, $\alpha$. The sample size, $N$, required to
meet the desired confidence is readily available in tabular
or graphical form.

Two methods of approximating the set $S$ were developed
using the theory of distribution-free tolerance regions.
The resulting sets, $S_L(\alpha,\beta)$ and $S_S(\alpha,\beta)$, have the property
that any $x$ contained in them satisfies the chance-constraint,
$\Pr[A x \le B] \ge \alpha$, with levels of confidence $\beta_L$ and $\beta_S$. The
advantage of the set $S_L(\alpha,\beta)$ is that it is a linear
constraint with exactly the same number of coefficients as the
original constraint. Furthermore, the values for these
coefficients can be determined directly by inspection of the
random samples. The disadvantage of the set $S_L(\alpha,\beta)$ is
that for fixed levels of $\alpha$ and $\beta$, the required sample size
increases rapidly as $n$, the dimension of $A = (A_1, \ldots, A_n)$,
increases. The convex set $S_S(\alpha,\beta)$, on the other hand, does
not possess this functional relationship between $N$ and $n$.
Another advantage is the flexibility which is provided for
choosing the general shape of the resulting distribution-free
set.

The superiority of the set $S_S(\alpha,\beta)$ over the sets
$S_Q(\alpha)$ and $S_T(\alpha)$ obtained via the Quantile and Tchebysheff
Methods was demonstrated using simulated gamma variates.

The Quantile Method, with normal variates, is superior
since the set $S_Q(\alpha)$ is equivalent to the desired set, $S$,
whereas the sets $S_S(\alpha,\beta)$ and $S_T(\alpha)$ are only small subsets of
$S$. However, when the normality assumption does not hold, it is
possible for the set $S_Q(\alpha)$ to contain points which do not
satisfy the constraint at the desired level, $\alpha$, as demonstrated
in Chapter IV. Thus, before employing the Quantile
Method, it is essential that the normality assumption be
carefully checked. If it is found that the underlying
distribution is definitely non-normal, then a distribution-free
approach should be considered over the Tchebysheff
Method for the following reasons.
1. It provides a way of measuring the effect of the sample
size, $N$, upon the confidence, $\beta$, associated with attainment
of the desired probability level, $\alpha$. With the Tchebysheff
Method, it is difficult to decide on an appropriate sample
size to estimate the required parameters.

2. The results of Chapter IV indicate that the set
$S_S(\alpha,\beta)$ is not as conservative as $S_T(\alpha)$, even for the
relatively high confidence level of $\beta = .95$. This
means that if points in $S_T(\alpha)$ are expected to satisfy
the constraint with a probability of at least $\alpha$, they actually
satisfy it at least $100\alpha_T$ percent of the time, where
$\alpha_T > \alpha$. The corresponding value for the set $S_S(\alpha,\beta)$ is
closer to the desired level, $\alpha$. This can be seen in Chapter
IV by comparing the values of ALPHA obtained using the above
methods.

Although the empirical results of Chapter IV were based
upon independently distributed random variates, the procedure
for constructing the sets $S_S(\alpha,\beta)$ and $S_L(\alpha,\beta)$ for
dependent variates is the same. This is not the case for
the set $S_T(\alpha)$, which requires estimates of the covariances.

It could be argued that the Tchebysheff Method is superior
to a distribution-free method on the grounds that the
former is able to take advantage of more information regarding
the interdependence of the random variables in question.
In real-world situations, however, estimation of covariances
is much more difficult than that of means and variances, and
the problem of assessing the effect of bad estimates upon
the set $S_T(\alpha)$ is made considerably more difficult.

The simulated random variables in Chapter IV were
continuously distributed. Had discrete variates been used,
the only deviation from the method of constructing
distribution-free sets would have arisen in the case of ties;
that is, two or more sample points would yield the same
maximum value of the particular cutting function employed.

In such a case, the ties could be broken using lexicographical
ordering rules as discussed in Chapter II. It should
be noted that the values of $\alpha$ and $\beta$ do not depend upon the
continuity of the variables in question.

The problem of increasing the size of a distribution-free
set was investigated. With the Quantile or Tchebysheff
Methods, the size of the chance-constrained set can be
expanded by decreasing the probability level, $\alpha$.

For a distribution-free method, the same goal can be
attained by repeating the construction procedure at lower
levels of $\alpha$ and/or $\beta$, with reduced sample sizes. If
resampling is not possible, then one must work with randomly
chosen subsets of the original sample. While this does not
guarantee an expanded set, the only alternative is to take
additional cuts on the original tolerance region. This is
not recommended since the resulting confidence level is
dependent upon a fixed but unknown quantity.
There are several possible extensions to the work
presented in this paper. Certainly there is a need for more
experimentation with distribution-free chance-constrained
sets using simulated data from distributions other than
gamma. Perhaps an even better insight into the usefulness
of these sets could be derived by applying them to real-world
linear programming problems with random coefficients.

Further research is needed in determining appropriate
values for the elements of the shaping vector $d$ for the
set $S_S(\alpha,\beta)$. This problem was investigated briefly in
Chapter IV, where it was shown that the choice of sample
means (as opposed to $d = 0$) resulted in a set $S'_S(\alpha,\beta)$ whose
shape was very close to that of the true chance-constrained
set.

The  notion  of  a  distribution-free  tolerance  region 
might  prove  to  be  beneficial  in  other  areas  of  stochastic 
linear  programming.  For  example,  in  distribution  problems, 
the  distribution  of  the  optimal  objective  function  value  is 
derived  explicitly  or  by  numerical  approximation,  then  de¬ 
cision  rules  are  based  on  features  of  the  distribution. 

The  alternative  distribution-free  approach  would  be  to  base 
decision  rules  on  distribution-free  tolerance  limits. 


APPENDIX 

PROOF  OF  THEOREM  3.3 


Theorem  3.3 

Let $U^{m}$ be the coverage of the region $T(\alpha,\beta)$ constructed
from a sample of size $N$ by removing $m$ blocks. Let $U^{m+m'}$ be
the coverage of the region $T'(\alpha',\beta')$ formed by removing $m'$
additional blocks from $T(\alpha,\beta)$. Then

$$\Pr[U^{m+m'} \ge \alpha'] = 1 - I_{\alpha'/U^{m}}(N-m-m'+1,\ m') = \beta'$$


Proof 

Let $U_1$ be the coverage of $O_1$ [as defined by (2.3) in
Chapter II]. Assign zero probability to $O_1$ and normalize
to unity the portion of the original population contained
in the complement of $O_1$. Let $U'_2$ be the conditional coverage
of $O_2$ given $U_1$. Continuing in this manner, a sequence of
conditional coverages $U_1, U'_2, \ldots, U'_N$ is obtained for which
the probability element (p.e.) can be shown [12] to be

$$N!\,(1 - U_1)^{N-1}(1 - U'_2)^{N-2} \cdots (1 - U'_N)^{N-N}\, dU_1\, dU'_2 \cdots dU'_N$$

In particular, the p.e. of the distribution of the conditional
coverages $U'_{m+1}, \ldots, U'_N$ is

$$(N-m)!\,(1 - U'_{m+1})^{N-m-1} \cdots (1 - U'_N)^{N-N}\, dU'_{m+1} \cdots dU'_N$$

Now the coverages $U_{m+1}, \ldots, U_N$ of the blocks $O_{m+1}, \ldots, O_N$
are related to the above conditional coverages in the following
manner.

$$
\begin{aligned}
U'_{m+1} &= U_{m+1}/U^{m} \\
U'_{m+2} &= U_{m+2}/(U^{m} - U_{m+1}) \\
&\ \ \vdots \\
U'_N &= U_N/(U^{m} - U_{m+1} - U_{m+2} - \cdots - U_{N-1})
\end{aligned}
$$

The Jacobian of this transformation is

$$\prod_{i=1}^{N-m} \Bigl(U^{m} - \sum_{k=1}^{i-1} U_{m+k}\Bigr)^{-1}$$

and the corresponding p.d.f. of the coverages $U_{m+1}, \ldots, U_N$ is

$$
\begin{cases}
\dfrac{(N-m)!}{(U^{m})^{N-m}}\, dU_{m+1} \cdots dU_N, & \sum_{i=1}^{N-m} U_{m+i} \le U^{m} \\
0 & \text{otherwise}
\end{cases}
$$

Making the transformation $U_{m+i}/U^{m} = V_i$, $i = 1, \ldots, N-m$,
the p.d.f. of the random variables $V_1, \ldots, V_{N-m}$ is

$$
\begin{cases}
(N-m)!\, dV_1 \cdots dV_{N-m}, & \sum_{i=1}^{N-m} V_i \le 1 \\
0 & \text{otherwise}
\end{cases}
$$

which is the $(N-m)$-variate Dirichlet.

The coverage $U^{m+m'}$ can now be expressed as
$U^{m+m'} = U^{m} - U^{m'}$, where $U^{m'}$ is the sum of the $m'$
additional coverages removed from $T(\alpha,\beta)$. It follows that

$$
\begin{aligned}
\Pr[U^{m+m'} \ge \alpha'] &= \Pr[U^{m'} \le U^{m} - \alpha'] \\
&= \Pr[V^{m'} \le 1 - (\alpha'/U^{m})]
\end{aligned}
$$